CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information
Open Information Extraction (OpenIE) methods extract (noun phrase, relation
phrase, noun phrase) triples from text, resulting in the construction of large
Open Knowledge Bases (Open KBs). The noun phrases (NPs) and relation phrases in
such Open KBs are not canonicalized, leading to the storage of redundant and
ambiguous facts. Recent research has posed canonicalization of Open KBs as
clustering over manually-defined feature spaces. Manual feature engineering is
expensive and often sub-optimal. In order to overcome this challenge, we
propose Canonicalization using Embeddings and Side Information (CESI) - a novel
approach which performs canonicalization over learned embeddings of Open KBs.
CESI extends recent advances in KB embedding by incorporating relevant NP and
relation phrase side information in a principled manner. Through extensive
experiments on multiple real-world datasets, we demonstrate CESI's
effectiveness.
Comment: Accepted at WWW 201
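To make "canonicalization over learned embeddings" concrete, here is a minimal hypothetical sketch: noun phrases are greedily merged into clusters whenever their embedding cosine similarity crosses a threshold. The phrases, vectors, and threshold below are invented for illustration; CESI's actual procedure is more involved and additionally incorporates side information.

```python
import numpy as np

def canonicalize(phrases, embeddings, threshold=0.8):
    """Greedy single-link clustering over phrase embeddings.

    Illustrative sketch only: merges a phrase into the first cluster
    containing any member whose cosine similarity >= threshold.
    """
    # Normalize rows so dot products equal cosine similarities.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    clusters = []  # each cluster is a list of phrase indices
    for i in range(len(phrases)):
        placed = False
        for cluster in clusters:
            # Single-link criterion: similar to any existing member.
            if any(emb[i] @ emb[j] >= threshold for j in cluster):
                cluster.append(i)
                placed = True
                break
        if not placed:
            clusters.append([i])
    return [[phrases[i] for i in c] for c in clusters]
```

With embeddings that place "Obama" and "Barack Obama" close together, the two surface forms end up in one cluster while an unrelated phrase stays separate, which is the redundancy reduction the abstract describes.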
MASR: Metadata Aware Speech Representation
In recent years, speech representation learning has primarily been framed
as a self-supervised learning (SSL) task, using the raw audio signal alone,
while ignoring the side-information that is often available for a given speech
recording. In this paper, we propose MASR, a Metadata Aware Speech
Representation learning framework, which addresses this limitation. MASR
enables the inclusion of multiple external knowledge sources to enhance the
use of metadata information. The external knowledge
sources are incorporated in the form of sample-level pair-wise similarity
matrices that are useful in a hard-mining loss. A key advantage of the MASR
framework is that it can be combined with any choice of SSL method. Using MASR
representations, we perform evaluations on several downstream tasks such as
language identification, speech recognition and other non-semantic tasks such
as speaker and emotion recognition. In these experiments, we illustrate
significant performance improvements for MASR over other established
benchmarks. We perform a detailed analysis on the language identification task
to provide insights on how the proposed loss function enables the
representations to separate closely related languages.
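The sample-level pairwise similarity matrices can be illustrated with a small hypothetical sketch: a binary matrix marks which pairs share a metadata value (here, a language tag), and hard negatives are then mined as the most embedding-similar samples that the metadata marks as dissimilar. All names and data below are invented; this is not MASR's actual implementation or loss.

```python
import numpy as np

def metadata_similarity(labels):
    """Sample-level pairwise similarity matrix from one metadata field.

    Illustrative sketch: entry (i, j) is 1.0 when samples i and j share
    the same metadata value (e.g. language tag), else 0.0.
    """
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(np.float32)

def hard_negative_indices(embeddings, sim):
    """For each sample, return the index of its hardest negative: the
    most embedding-similar sample among those the metadata marks as
    dissimilar (sim == 0). Self-pairs have sim > 0 and are excluded.
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = emb @ emb.T
    scores[sim > 0] = -np.inf  # mask metadata-similar pairs (and self)
    return scores.argmax(axis=1)
```

A hard-mining loss would then pull metadata-similar pairs together while pushing each sample away from the negatives selected this way.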
InteractE: Improving Convolution-based Knowledge Graph Embeddings by Increasing Feature Interactions
Most existing knowledge graphs suffer from incompleteness, which can be
alleviated by inferring missing links based on known facts. One popular way to
accomplish this is to generate low-dimensional embeddings of entities and
relations, and use these to make inferences. ConvE, a recently proposed
approach, applies convolutional filters on 2D reshapings of entity and relation
embeddings in order to capture rich interactions between their components.
However, the number of interactions that ConvE can capture is limited. In this
paper, we analyze how increasing the number of these interactions affects link
prediction performance, and utilize our observations to propose InteractE.
InteractE is based on three key ideas -- feature permutation, a novel feature
reshaping, and circular convolution. Through extensive experiments, we find
that InteractE outperforms state-of-the-art convolutional link prediction
baselines on FB15k-237. Further, InteractE achieves an MRR score that is 9%,
7.5%, and 23% better than ConvE on the FB15k-237, WN18RR and YAGO3-10 datasets
respectively. The results validate our central hypothesis -- that increasing
feature interaction is beneficial to link prediction performance. We make the
source code of InteractE available to encourage reproducible research.
Comment: Accepted at AAAI 202
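The circular-convolution idea can be sketched in a few lines of NumPy: wrap-around padding lets every filter position mix components across the borders of the 2D reshaping, increasing the number of feature interactions relative to zero padding. This is a hypothetical minimal illustration of circular convolution alone; it is not InteractE's implementation, which combines it with feature permutation and a novel reshaping.

```python
import numpy as np

def circular_conv2d(x, kernel):
    """2D circular convolution of x with kernel.

    Implemented as cross-correlation (as is standard in deep learning)
    with wrap-around padding, so edge positions interact with
    components from the opposite border.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # Circular padding: rows and columns wrap around.
    padded = np.pad(x, ((ph, ph), (pw, pw)), mode="wrap")
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out
```

With a 3x3 all-ones kernel on a 3x3 input, every output position sums all nine input components, showing that no component is ever cut off at a border, unlike zero-padded convolution.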